Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

Neural Information Processing Systems

Neural sequence-to-sequence models are well established for applications that can be cast as mapping a single input sequence into a single output sequence. In this work, we focus on one-to-many sequence transduction problems, such as extracting multiple sequential sources from a mixture sequence. We extend the standard sequence-to-sequence model to a conditional multi-sequence model, which explicitly models the relevance between multiple output sequences via the probabilistic chain rule. Based on this extension, our model can conditionally infer output sequences one by one, making use of both the input and the previously estimated contextual output sequences. The model additionally has a simple and efficient stop criterion for the end of the transduction, enabling it to infer a variable number of output sequences. We take speech data as a primary test field to evaluate our methods, since observed speech data is often composed of multiple sources due to the superposition principle of sound waves. Experiments on several different tasks, including speech separation and multi-speaker speech recognition, show that our conditional multi-sequence models lead to consistent improvements over the conventional non-conditional models.
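The inference loop described in the abstract (estimate one source at a time, conditioning each step on the mixture and all previously estimated outputs, and stopping when nothing remains) can be sketched as follows. This is a hypothetical toy illustration, not the authors' implementation: the "mixture" is an elementwise sum of source sequences, the `decode_step` stand-in simply matches the residual against a known source bank, and the stop criterion fires when the residual is empty.

```python
# Toy sketch of conditional chain inference (hypothetical; not the paper's code).
# Each step conditions on the mixture and previously estimated sources by
# subtracting them out, then "decodes" one more source from the residual.

def decode_step(mixture, previous, bank):
    """Hypothetical single-step decoder: explain the residual with the
    closest entry from a known source bank (a stand-in for a real model)."""
    residual = [m - sum(p[i] for p in previous) for i, m in enumerate(mixture)]
    if all(abs(r) < 1e-9 for r in residual):
        return None  # stop criterion: nothing left to explain
    return min(bank, key=lambda s: sum((r - x) ** 2 for r, x in zip(residual, s)))

def conditional_chain_infer(mixture, bank, max_sources=10):
    outputs = []  # previously estimated sources, fed back at every step
    for _ in range(max_sources):
        est = decode_step(mixture, outputs, bank)
        if est is None:        # variable number of output sequences
            break
        outputs.append(est)
    return outputs

sources = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0]]
mixture = [a + b for a, b in zip(*sources)]
print(conditional_chain_infer(mixture, sources))  # recovers both sources, then stops
```

The key structural point is that `outputs` re-enters `decode_step` at every iteration, which is exactly the chain-rule conditioning p(Y_i | Y_1, ..., Y_{i-1}, X) that distinguishes this from independent per-source decoding.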


Review for NeurIPS paper: Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

Neural Information Processing Systems

Weaknesses: Generality: The idea is not as general as presented. It is quite similar to techniques like multi-source decoding [1], where the decoder of a network is conditioned on multiple input sequences, and deliberation models [2]. Furthermore, if one uses attention-based models that directly model p(Y | X), conditioning on multiple sequences is quite straightforward, and not uncommon. For example, given two output sequences Y1 and Y2, by concatenating Y1 and Y2 one can model p(Y1, Y2 | X) = p(Y1 | X) p(Y2 | Y1, X), which naturally conditions on both the input and the prior sequence (see, e.g., [4]). The novelty lies mostly in how this is being applied to multi-talker separation tasks, or more specifically, to multiple tasks from the same domain where the order of the outputs doesn't matter much.


Review for NeurIPS paper: Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

Neural Information Processing Systems

All reviewers agree that the paper is an interesting contribution (a conditional chain model) to an important problem (the multi-sequence problem, with application to ASR). There were concerns that the experimental section was on the weak side, as well as about some unclear points. However, the reviewers found the rebuttal convincing enough and raised their scores accordingly.
